Abstract

Rsim is a switch-level simulator which can simulate large digital MOS integrated
circuits with speedups of over 3 orders of magnitude over SPICE. Unfortunately,
Rsim's simple switched-resistor model renders it incapable of simulating certain
CMOS and most BiCMOS and ECL digital circuits. We observe that the
switched-resistor model is just one particular piecewise linear model and that
Rsim's simulation framework can accommodate more elaborate piecewise linear
models. The resulting simulator, Mom, combines the efficiency of switch-level
simulation with the ability to simulate a wider variety of circuits. We
demonstrate Mom's efficiency and flexibility on a variety of circuits.
1 Introduction



The high cost of semiconductor processing makes it desirable to verify the correctness of a large 
custom digital integrated circuit before it is fabricated. Although circuit simulators are usually 
used to analyze small pieces of the design, they can't be used to simulate entire integrated circuits 
(having potentially millions of transistors) because their algorithms are inefficient and require 
execution times which grow superlinearly with circuit size. Nonetheless there is a need to verify 
at least the logical functionality of the entire design to confirm that no errors are made when the 
pieces are assembled. 

To satisfy this need much work has gone into trying to accelerate circuit simulation[1, 4, 9],
and speedups of two orders of magnitude over SPICE have been reported. In contrast, we attempt
to increase the accuracy of a switch-level simulator, Rsim[10]. This approach is attractive because 
switch-level simulators achieve speedups of over 3 orders of magnitude for small circuits. For 
large circuits the speedups are virtually unbounded, being determined by the amount of latency in 
the circuit under simulation. 

Limitations of the switch-level approach must be addressed. Although Rsim is useful for 
predicting the first order behavior of most digital MOS circuits, the simple switched-resistor model 
is inadequate for MOS circuits which are more "analog" in nature (for example RAM sense 
amplifiers) and for most BiCMOS and ECL circuits. We observe that Rsim's switched-resistor 
model is just one particular piecewise linear model and that Rsim can be modified to allow other 
more general piecewise linear models¹. Multiple models of varying degrees of sophistication are
provided allowing the user to make different speed vs accuracy tradeoffs for different parts of the 
integrated circuit. Comparisons with existing switch-level and circuit-level simulators reveal that 
our simulator, Mom[5], approaches the speed of switch-level simulators when the simplest transistor 
models are used. When more complex models are used Mom is able to handle sophisticated CMOS, 
BiCMOS and ECL circuits, which are beyond the capabilities of existing switch-level simulators. 

2 Rsim's algorithm 

Since Mom uses the same basic algorithm as Rsim we will briefly review it here. Rsim approximates 
the behavior of MOS transistors using the switched-resistor model (Figure 1). This consists of 
the series combination of a resistor and a voltage controlled switch. If the gate voltage, V_g, of an
NMOS (PMOS) transistor is at a logic level high (low) then the switch is closed and the transistor 
may be replaced by a resistor. Otherwise the switch is open and the transistor is an open circuit. 
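The switching rule just described can be written down in a few lines. The following is a minimal sketch only: the class name, the field names, and the use of None for an open circuit are illustrative assumptions, not taken from Rsim's source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SwitchedResistor:
    """Hypothetical switched-resistor transistor model (names are assumptions)."""
    kind: str     # "n" for NMOS, "p" for PMOS
    r_on: float   # resistance in ohms when the switch is closed

    def is_on(self, gate_high: bool) -> bool:
        # NMOS conducts when the gate is at logic high, PMOS when it is low.
        return gate_high if self.kind == "n" else not gate_high

    def resistance(self, gate_high: bool) -> Optional[float]:
        # None stands for an open circuit (switch open).
        return self.r_on if self.is_on(gate_high) else None
```

For example, an NMOS device with r_on = 10e3 yields 10e3 ohms when its gate is high and None (open) when its gate is low.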

¹ Our approach was inspired by Pillage[7] who first suggested the combination of Asymptotic Waveform Evaluation
with piecewise linear models as the basis of a new kind of circuit simulator. A circuit simulator built upon those
principles demonstrates speedups over SPICE of a factor of 6[3]. We extend that approach by giving up the full
generality of a circuit simulation in order to achieve greater efficiency.

Figure 1: Switched Resistor Model.

Figure 2: Clusters.

When a node changes value (node "in" in Figure 2) all transistors with a gate attached to the node
will switch and Rsim must determine the response of all subcircuits containing those transistors.
The subcircuits, known as "stages" or "clusters", are identified by finding all nodes connected to
the source or drain of the switching transistor along some path of "on" transistors. In Figure 2
clusters X and Y need to be analyzed as a result of node "in" changing (cluster Z will be analyzed
when node A changes). Note that clusters X, Y, and Z can be analyzed independently of each other
and of all other subcircuits in the integrated circuit. The switched-resistor model allows Rsim to
partition the circuit and take advantage of latency.
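Cluster identification of this kind amounts to a graph traversal over the channels of "on" transistors. The sketch below illustrates the idea under stated assumptions: the function name, the (source, drain, is_on) tuple layout, and the choice to stop the traversal at supply rails are all hypothetical and are not taken from Rsim.

```python
from collections import defaultdict


def find_cluster(start, transistors, supplies=("vdd", "gnd")):
    """Return the set of nodes reachable from `start` through "on" transistors.

    transistors: iterable of (source, drain, is_on) tuples (assumed layout).
    supplies:    nodes treated as cluster boundaries (an assumption of this sketch).
    """
    # Build an undirected adjacency map from the conducting channels only.
    adj = defaultdict(set)
    for source, drain, is_on in transistors:
        if is_on:
            adj[source].add(drain)
            adj[drain].add(source)

    cluster, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in cluster:
            continue
        cluster.add(node)
        if node in supplies:
            # Supply rails belong to the cluster but are not traversed through,
            # otherwise every cluster would merge via the power network.
            continue
        stack.extend(adj[node] - cluster)
    return cluster
```

Because "off" transistors contribute no edges, two subcircuits separated only by "off" devices or by gate connections end up in different clusters, which is what allows them to be analyzed independently.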

Figure 3: Equivalent Circuit.

To compute the response of a cluster Rsim analyzes the circuit formed when "on" transistors
are replaced by resistors and "off" transistors by open circuits (Figure 3). For a typical MOS logic
gate the resulting circuit is an RC tree, that is a tree of resistors with the root node grounded and
capacitors to ground at every other node. This is convenient because the step response of an RC
tree is well approximated by an exponential with a time constant equal to the first moment. In turn,
the first moment can be efficiently computed (O(n) complexity) via a depth first traversal of the
tree.
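The linear-time computation of the first moment can be sketched as two passes over the tree: one to accumulate downstream capacitance and one to accumulate the time constant from the root outward. This is the standard first-moment (Elmore) calculation for RC trees; the function name and the dictionary-based tree representation are assumptions made for illustration, not Rsim's data structures.

```python
def first_moments(children, r, c, root):
    """First moment (time constant) at every node of an RC tree.

    children: dict mapping a node to the list of its child nodes
    r:        dict mapping a node to the resistance of the branch to its parent
    c:        dict mapping a node to its capacitance to ground
    root:     the driven (grounded) root node
    """
    # Depth-first preorder: every node appears after its parent.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))

    # Visit children before parents to sum the capacitance below each node.
    c_down = {}
    for node in reversed(order):
        c_down[node] = c.get(node, 0.0) + sum(
            c_down[k] for k in children.get(node, [])
        )

    # Visit parents before children to accumulate R * (downstream C).
    tau = {root: 0.0}
    for node in order:
        for k in children.get(node, []):
            tau[k] = tau[node] + r[k] * c_down[k]
    return tau
```

With a two-node chain where r = {"a": 1.0, "b": 2.0} and c = {"a": 1.0, "b": 2.0}, node a gets 1.0 * (1.0 + 2.0) = 3.0 and node b gets 3.0 + 2.0 * 2.0 = 7.0. Each node is visited a constant number of times, which gives the O(n) cost.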



3 Piecewise linear models 

Although Rsim's algorithm was described assuming the use of the switched-resistor model, it can 
accommodate more general piecewise linear models. Rsim depends upon two characteristics of the 
switched-resistor model: 1) the unidirectional coupling from the gate to the source and drain allows 
Rsim to partition the circuit and 2) the simplicity of the model permits efficient timing analysis. 
However, more general piecewise linear models can be chosen to retain both characteristics. For 
example, models may have more than two regions of linearity. In that case events are associated 
not only with transitions between the two states: "on" and "off" but also with transitions between 
any two adjacent regions. In addition, the model needn't be a resistor. Rsim's tree analysis can be 
extended to handle device models which include dependent sources. 

Figure 4: Piecewise Linear MOS Model: (a) off, (b) linear, (c) saturated.

More general piecewise linear models can yield substantial improvements over the
switched-resistor model. One model that has proven to be particularly useful is depicted in Figure 4. Three
regions of operation are modeled. If the gate-source voltage, V_gs, is less than a threshold, V_t, the
transistor is off. If V_gs > V_t and the drain-source voltage is large then the transistor is saturated and
the current is largely determined by the gate-source voltage (although the output conductance g_0
models channel length modulation). If V_gs > V_t and the drain-source voltage is small then the
transistor is linear and the current depends only on the drain-source voltage. We will refer to this
model as Mom's Level-1 MOS model.
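A three-region model of this kind might be sketched as follows. The parameter names (gm, g_lin, g0), their default values, and the rule that the linear/saturated boundary lies where the two linear branches intersect are all assumptions made for illustration; the text above gives only the qualitative behavior of each region.

```python
def level1_current(vgs, vds, vt=0.8, gm=1e-3, g_lin=2e-3, g0=1e-4):
    """Drain current (A) and region for an NMOS device with vds >= 0.

    All parameter values are placeholders, not fitted to any process.
    """
    if vgs <= vt:
        return 0.0, "off"
    # Linear region: current depends only on the drain-source voltage.
    i_lin = g_lin * vds
    # Saturated region: set mainly by vgs, with g0 giving a small slope in vds.
    i_sat = gm * (vgs - vt) + g0 * vds
    # Assumed boundary: whichever branch gives the smaller current applies,
    # so the characteristic is continuous where the two lines cross.
    if i_lin < i_sat:
        return i_lin, "linear"
    return i_sat, "saturated"
```

With these placeholder values, vgs = 1.8 and vds = 0.1 fall in the linear region, while vgs = 1.8 and vds = 3.0 fall in the saturated region.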

Figure 5 shows the I-V characteristics of this model superimposed over the I-V characteristics 
of a SPICE transistor. The match is good because, for modern short channel devices, velocity 
saturation tends to linearize what would otherwise be a quadratic dependence of the current upon 
the gate-source voltage. 

Figure 6 shows Mom's output for a CMOS ring oscillator using the switched-resistor model and Mom's
Level-1 MOS model. It can be seen that the Level-1 MOS model brings a substantial improvement
in waveform accuracy².




Figure 5: Piecewise Linear vs SPICE I-V Characteristics: SPICE Level-3 models for MOSIS 1.2µ
Process.



² Also apparent from the figure is Mom's use of voltages rather than Boolean values to represent the state of nodes.
Because Rsim is targeted towards CMOS logic it assumes that all signals swing rail-to-rail and only records whether 
a signal swings to the positive or negative power supply rail. However Mom must also simulate circuits with signals 
which don't swing rail to rail (for example memory sense amplifiers and ECL logic gates). Consequently Mom uses 
voltages to represent node state. 
Figure 6: CMOS Ring Oscillator: Mom vs SPICE.



4 Timing analysis 

Piecewise linear models require the use of more sophisticated timing analysis techniques. In 
principle, Rsim's first moment timing analysis could be used. Rsim's techniques for computing 
moments from an RC tree are readily generalized to allow piecewise linear transistor models. The 
response of clusters can then be approximated by (piecewise) exponentials. When we did this we 
discovered that the use of more accurate piecewise linear models yielded timing estimates that were 
worse rather than better. Although an exponential is a good approximation of the step response of 
an RC tree it is not necessarily a good approximation of the response of more sophisticated circuits. 
Consequently Mom employs a more general moments matching technique[2, 8]. Instead of using 
just the first moment to create a waveform approximation consisting of a single exponential, Mom 
also uses additional higher order moments to create a waveform approximation consisting of the 
sum of exponentials. 
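For reference, moment matching is commonly written as follows in the Asymptotic Waveform Evaluation literature. The symbols m_j, k_i, p_i, and q are notation assumed here and are not taken from Mom's implementation. The transfer function of the linearized cluster is expanded in powers of s, and a q-pole model is chosen so that its expansion agrees with the first 2q moments:

```latex
H(s) = \sum_{j \ge 0} m_j s^j, \qquad
\hat{H}(s) = \sum_{i=1}^{q} \frac{k_i}{s - p_i}, \qquad
m_j = -\sum_{i=1}^{q} \frac{k_i}{p_i^{\,j+1}}, \quad j = 0, \dots, 2q-1,
\qquad
\hat{h}(t) = \sum_{i=1}^{q} k_i e^{p_i t}.
```

For q = 1 these conditions give p_1 = m_0 / m_1, that is, a single exponential with time constant -m_1 / m_0, which is the first-moment approximation. Larger q adds further exponential terms to the waveform.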

Note that the increased computational cost of this timing analysis technique relative to Rsim's is 
mostly due to the estimation of multiple poles rather than the computation of additional moments. If 
the circuit has a tree topology (and most do) then the cost of moment computation rises only linearly 
with the size of the cluster and the number of moments. However, the cost of computing poles rises 
superlinearly with the number of poles. For this reason Mom restricts waveform approximations 
to three or fewer poles. 



5 Demonstration 

We will now demonstrate the flexibility of Mom on a number of CMOS, ECL, and BiCMOS 
circuits which are beyond the capabilities of conventional switch-level simulators. 

The dynamic RAM is an interesting example because although Rsim can simulate most of the
circuits in the RAM it has problems with the sense amplifier (Figure 7). The sensing phase
(Figure 8) begins with the two bit lines, bit and its complement, charged to slightly different
voltages. A rising transition on sense turns on the sense amplifier to magnify this voltage
difference. When T3 turns on, s begins to fall which will cause either T1 or T2 to turn on,
depending upon which bit line is higher. For example, if bit is higher then T2 will turn on, the
other bit line will be pulled to ground, and bit will be pulled to Vdd.

Figure 7: Dynamic RAM Cell and Sense Amplifier.

Note that T1 or T2 is turned on by pulling the source terminal low. Because the switched-resistor
model can only be turned on by pulling the gate terminal high it is incapable of modeling
the behavior of those two transistors. However, if those transistors are simulated using Mom's
MOS Level-1 model, the correct circuit behavior can be obtained. Figure 8 shows plots of the bit
line waveforms generated by SPICE and Mom for a read, sense, precharge sequence. For Mom's
simulation the Level-1 model was only used for T1, T2, and T3. Everywhere else switched-resistor
models were employed. Although Mom's response differs from SPICE's, it is adequate for a first
order verification of the entire DRAM. Table 1 shows that for this example Mom is 250 times faster
than SPICE.

                        SPICE     Mom       SPICE/Mom
  DRAM Cell             18.8      .075      250
  ECL RAM Cell          4.2       .102      41
  BiCMOS Buffer         5.1       .008      650
  DRAM (21k devices)    -         13.900    >1890

Table 1: Execution Time of Example Circuits (seconds).

Figure 9: ECL Current Steering Networks.

The ECL switch-level simulator, Bisim[6], is based upon tracing paths through current steering
networks formed by bipolar transistors (Figure 9). Negative current can be thought of as
originating from the current source at the bottom of the network and rising towards the top. When
the current encounters a node with multiple emitters attached, it is steered through the transistor
with the highest base. If the current encounters a resistor then it has reached an output and the
resulting voltage drop causes the output to fall. Thus a simple path tracing algorithm is sufficient
to determine the behavior of textbook ECL logic gates.
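The path tracing rule described above can be illustrated with a short sketch. It is not Bisim's implementation: the function name, the dictionary layout, and the assumption that the steering network is acyclic are all hypothetical.

```python
def trace_current(start, emitters, outputs):
    """Follow the current from a source node up to the output it pulls low.

    emitters: dict mapping a node to a list of (base_voltage, collector_node)
              pairs, one per transistor whose emitter is attached to that node
    outputs:  set of nodes tied to load resistors
    Assumes the steering network contains no cycles.
    """
    node, path = start, [start]
    while node not in outputs:
        candidates = emitters.get(node)
        if not candidates:
            # No transistor to steer through; this sketch gives up here.
            return path, None
        # The current is steered through the transistor with the highest base.
        _, node = max(candidates, key=lambda pair: pair[0])
        path.append(node)
    return path, node
```

For a single differential pair with bases at -1.3 V and -0.9 V, the current is steered through the second transistor and the output on its collector falls. Note that the rule always picks exactly one transistor, so it cannot represent current that is shared between devices.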

However, under certain circumstances current isn't simply switched between one transistor or
another but rather is shared. For example, in an ECL RAM (Figure 10) current from a single
current source is divided between all cells attached to the bottom word line. Simple path tracing
algorithms can't determine how current should be shared. Figure 11 shows the output of SPICE
and Mom during a cell write operation. For this example Mom is 41 times faster than SPICE.

Figure 10: ECL RAM.

Figure 11: ECL RAM Cell during Write.

An increasing number of circuit designs utilize both bipolar and MOS transistors on the same 
chip. Unfortunately neither Rsim nor Bisim can handle these new BiCMOS designs. Figure 12 
shows one variation of the BiCMOS buffer. For this example we trade off waveform accuracy 
for simulation efficiency by selecting the switched-resistor model for the MOS transistors. The 
outputs of SPICE and Mom are compared in Figure 13. A comparison of execution times reveals
that Mom is 650 times faster than SPICE. 

Figure 13: BiCMOS Buffer Response.

The preceding benchmarks understate the potential benefit of switch-level simulation because
they are small and have little latency. The last circuit is a complete 16k bit DRAM (about 21,000
transistors), including the entire array, address decoders, data multiplexors, and control. Because
Mom partitions the circuit and takes advantage of latency, the circuit can be simulated efficiently.
Although the complete DRAM is 1400 times larger than the single cell DRAM circuit (in terms
of transistor count) Mom needs only 185 times as much CPU time (Table 1) to simulate it³. For
this example, Mom's execution time grows sublinearly with the size of the circuit. By comparison,
SPICE's computational requirements tend to grow superlinearly with circuit size; the complete
DRAM was too large to simulate using SPICE. However, even if SPICE's execution times were to
scale linearly with circuit size, Mom would still be over 1800 times faster.
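The scaling figures quoted above follow directly from the execution times in Table 1. The short calculation below reproduces them; the variable names are illustrative, and the linear extrapolation of SPICE's run time is the hypothetical best case described in the text, not a measurement.

```python
# Execution times in seconds, taken from Table 1.
spice_cell = 18.8    # SPICE, single cell DRAM circuit
mom_cell = 0.075     # Mom, single cell DRAM circuit
mom_full = 13.9      # Mom, complete DRAM (about 21,000 transistors)
size_ratio = 1400    # complete DRAM vs single cell circuit, by transistor count

# Growth in Mom's CPU time: about 185x for a 1400x larger circuit.
cpu_ratio = mom_full / mom_cell

# Hypothetical SPICE time if it scaled only linearly with circuit size.
spice_full_linear = spice_cell * size_ratio

# Lower bound on Mom's speedup for the complete DRAM: roughly 1890x.
speedup_bound = spice_full_linear / mom_full
```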

6 Performance 

The previous section illustrated the additional flexibility obtained by incorporating piecewise 
linear models into the switch-level framework. However, increased generality usually comes at 
the expense of decreased efficiency. To investigate this issue a number of ring oscillators were 
simulated using SPICE-3d2, Mom, Irsim, and Bisim⁴. Ring oscillators were built using a CMOS
inverter, CMOS NAND gate, ECL inverter, and BiCMOS buffer at each stage. In addition the
two CMOS ring oscillators were simulated using both the switched-resistor model ("CMOS0")
and Mom's Level-1 MOS model ("CMOS1"). Many periods of oscillation were simulated in order
to wash out the effects of simulator initialization. The relative efficiencies of the simulators were
compared based on the amount of CPU time required to simulate identical numbers of oscillations.
Table 2 shows the results. The first two columns give the speedup of Mom over SPICE-3d2
and the slowdown of Mom relative to the switch-level simulators. The last column reports the
percentage error in Mom's prediction of the period of oscillation relative to SPICE.

                         Ratio of CPU times             Period
                         SPICE/Mom   Mom/(Bi/Ir)sim     Error %
  CMOS0 Inverter Ring    1600        2.7                0.4
  CMOS1 Inverter Ring    80          53.2               2.6
  CMOS0 NAND Ring        2300        1.5                28.0
  CMOS1 NAND Ring        81          42.0               4.7
  ECL Ring               230         3.3                16.5
  BiCMOS Buffer Ring     1400        -                  10.5

Table 2: Simulator Performance on Ring Oscillators.

The efficiency of switch-level simulation is evident from the table. Bisim and Irsim are from 760
to 4200 times faster than SPICE. In addition, Mom's increased generality exacts only a moderate
performance penalty when switch-level models are employed. For circuits using the MOS
switched-resistor model ("CMOS0 Inverter Ring" and "CMOS0 NAND Ring") Mom is between 1.5 and 2.7
times slower than Irsim. For the ECL ring Mom is 3.3 times slower than Bisim. Note that for these
models the accuracy of Mom is comparable to that of the switch-level simulators.

³ The single cell DRAM circuit consists of one column whereas the complete DRAM has 128 columns. Since an
access activates all 128 columns, the simulation of the complete DRAM involves at least 128 times as much work as
the simulation of a single column.

⁴ SPICE-3d2 is a derivative of the circuit simulator SPICE, and Irsim is a derivative of the MOS switch level
simulator Rsim.

As more accurate models are used Mom's efficiency decreases rapidly. When MOS Level-1
models are used in the CMOS ring ("CMOS1 Inverter Ring") Mom slows down by a factor of
20. In return, Mom achieves increased accuracy. For this pair of circuits the period estimated by
Mom is off by only 2.6% and 1.2% relative to SPICE. Such low errors are generally beyond the
capabilities of Irsim and Bisim⁵.

Execution profiles revealed the source of the speed degradation. When Mom simulated the
CMOS inverter ring using the switched-resistor models, 68% of the execution time was spent in
timing analysis and 18% of the time was spent rescheduling transistors. However, when Mom's
Level-1 MOS models were used, 25% of the time was spent in timing analysis and 69% of the time
was spent rescheduling transistors.

A couple of factors contribute to the increased cost of rescheduling. Devices with greater
numbers of regions require more checks for region changes. If the switched-resistor model is off it
is only necessary to check if the model will turn on. In contrast, two checks must be made for the
Level-1 MOS model. If the model is in the linear region it is necessary to check if the model will
enter the saturated region or the off region. Additional expense is incurred because waveforms
consist of sums of exponentials rather than just single exponentials. The root of an equation which
has a single exponential can be found explicitly. The root of an equation which has the sum of
three exponentials must be found iteratively.
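The difference between the two cases can be illustrated with a short sketch. The closed-form expression for a single exponential is standard. For the sum of exponentials, bisection is used here purely as an example of an iterative method, since the text does not say which method Mom uses. Function names and argument layouts are assumptions.

```python
import math


def crossing_single(v0, v_inf, tau, v_th):
    """Time at which v(t) = v_inf + (v0 - v_inf) * exp(-t / tau) reaches v_th.

    Solved explicitly; assumes v_th lies strictly between v0 and v_inf.
    """
    return tau * math.log((v0 - v_inf) / (v_th - v_inf))


def crossing_sum(v_inf, terms, v_th, t_hi, rel_tol=1e-9):
    """Time at which v(t) = v_inf + sum(k * exp(p * t)) reaches v_th.

    terms: list of (k, p) pairs with p < 0.
    Uses bisection on [0, t_hi]; assumes exactly one crossing in that interval.
    """
    def f(t):
        return v_inf + sum(k * math.exp(p * t) for k, p in terms) - v_th

    lo, hi = 0.0, t_hi
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("no sign change on [0, t_hi]")
    while hi - lo > rel_tol * t_hi:
        mid = 0.5 * (lo + hi)
        if f_lo * f(mid) <= 0:
            hi = mid   # crossing lies in the lower half
        else:
            lo = mid   # crossing lies in the upper half
    return 0.5 * (lo + hi)
```

The explicit form costs one logarithm, whereas the iterative search evaluates every exponential term at each step. That difference is repeated for every region-change check made during rescheduling.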

7 Conclusion 

We have shown that Rsim's basic switch-level simulation framework can accommodate more 
general piecewise linear transistor models along with the original switched-resistor model. These 
more general models can be incorporated without seriously impairing the simulator's efficiency 
for the simplest cases. That is, when the simplest switch-level models are used, our simulator,
Mom, achieves speeds and accuracies comparable to those of dedicated switch-level simulators. In 
addition the more general models give the simulator greater flexibility. Mom can simulate circuits 
that can't be simulated by Rsim or Bisim with substantial speedups over SPICE. 

This approach is particularly well suited for simulating circuits that are just beyond the
capabilities of switch-level simulation. Frequently most of a circuit can be simulated using switch-level
models and only small portions require more accurate models. Because Mom has been structured
such that the additional generality is paid for only where it is used, it can simulate those circuits
with only a minor degradation of efficiency.

Our experiments uncovered some limitations of the approach. Benchmarks show that the cost 
of rescheduling devices rises rapidly as the complexity of transistor models is increased. Unless 
this cost is ameliorated, the approach could lose its speed advantage when models approaching the 
accuracy and generality of SPICE's nonlinear models are used. 

⁵ The low error of the CMOS0 inverter ring is not representative. It occurred only because that circuit was used to
calibrate the switch-level model. 